Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web
Abstract
Numeric data such as statistics and sensor readings are increasingly published on the Web and, if brought together for comparisons and calculations, can answer important questions in science, industry, and politics. For instance, natural scientists compare rainfall values from sensors with hydrological estimations documented in a semantic wiki; financial analysts evaluate companies based on comparisons between KPIs from balance sheets filed with the SEC and daily stock market values from Yahoo! Finance; and citizens want to explore the GDP per capita of different countries, independent of information sources such as Eurostat, the IMF, and the World Bank.

However, the integration of datasets for analysis is difficult. First, heterogeneities remain a problem since publishers use different dimensions to describe numeric data, several identifiers for common entities, as well as differing levels of detail, units, and formulas. Second, aggregation and filtering operations, a varying selectivity of queries, and chains of joins over a growing number of possibly large datasets, together with background information from the Web, render analytical queries more complex than in typical data analysis settings.

The broad acceptance of the RDF Data Cube Vocabulary (QB) for publishing multidimensional datasets and of Online Analytical Processing (OLAP) interfaces for intuitive and interactive knowledge discovery calls for a uniform view, the Global Cube, over available numeric data for exploratory analysis. This work presents four complementary contributions to query the Global Cube:

1. A mapping between the common model of data cubes and QB to use existing OLAP engines for efficient queries over datasets from the Web.
2. An algorithm to evaluate OLAP operations over data cubes using SPARQL queries over QB datasets for more flexible data integration based on RDF stores.
3. A method to optimise analytical query processing via materialised aggregate views in RDF, including an evaluation with a realistic benchmark.
4. A method to declaratively describe complex relationships between datasets for flexibly increasing the number of answers from the Global Cube.

The contributions are applied to three scenarios in the areas of water resources management, company performance analysis, and Open Government Data exploration.
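To make the second contribution more concrete, the following is a minimal sketch of how a roll-up (aggregating a measure over the dimensions that are kept) might be expressed as a SPARQL query over a QB dataset. It is an illustration under stated assumptions, not the algorithm from the work itself: the function name `rollup_query`, its parameters, and all `example.org` URIs are hypothetical. Only the QB namespace and the `qb:dataSet` property come from the RDF Data Cube Vocabulary.

```python
# Namespace of the RDF Data Cube Vocabulary (QB).
QB = "http://purl.org/linked-data/cube#"

# Aggregate functions accepted by this sketch (all defined in SPARQL 1.1).
ALLOWED_AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def rollup_query(dataset_uri, group_by, measure, aggregate="SUM"):
    """Build a SPARQL query string that aggregates one QB measure.

    dataset_uri: URI of the qb:DataSet (hypothetical in the example below).
    group_by:    list of dimension property URIs to keep; every other
                 dimension is aggregated away (a roll-up to "all").
    measure:     URI of the measure property to aggregate.
    aggregate:   name of a SPARQL 1.1 aggregate function.
    """
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")

    # One query variable per kept dimension: ?d0, ?d1, ...
    dim_vars = [f"?d{i}" for i in range(len(group_by))]

    lines = [
        f"PREFIX qb: <{QB}>",
        "SELECT " + " ".join(dim_vars + [f"({aggregate}(?m) AS ?value)"]),
        "WHERE {",
        f"  ?obs qb:dataSet <{dataset_uri}> ;",
    ]
    for var, dim in zip(dim_vars, group_by):
        lines.append(f"       <{dim}> {var} ;")
    lines.append(f"       <{measure}> ?m .")
    lines.append("}")

    # GROUP BY is only needed when at least one dimension is kept.
    if dim_vars:
        lines.append("GROUP BY " + " ".join(dim_vars))
    return "\n".join(lines)


# Example: average GDP per capita per country (all URIs are made up).
query = rollup_query(
    "http://example.org/dataset/gdp",
    ["http://example.org/dimension/country"],
    "http://example.org/measure/gdpPerCapita",
    aggregate="AVG",
)
print(query)
```

The generated string can be sent to any SPARQL 1.1 endpoint that holds the QB observations. Slicing or dicing could be added to the same sketch as `FILTER` clauses on the dimension variables.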
Similar resources
Adaptive Information Analysis in Higher Education Institutes
Information integration plays an important role in academic environments since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem in the traditional information integration is the lack of personalization due to weak information resource or unavailability of analysis functionality. In this ...
Multi Objective Scheduling of Utility-scale Energy Storages and Demand Response Programs Portfolio for Grid Integration of Wind Power
Increasing the penetration of variable wind generation in power systems has created some new challenges in the power system operation. In such a situation, the inclusion of flexible resources which have the potential of facilitating wind power integration is necessary. Demand response (DR) programs and emerging utility-scale energy storages (ESs) are known as two powerful flexible tools that ca...
Efficient Method Based on Combination of Deep Learning Models for Sentiment Analysis of Text
People's opinions about a specific concept are considered as one of the most important textual data that are available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. In this regard, developing automatic sentiment analysis systems that can extract opinions and express their intellectual process has ...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...